{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<font color=\"red\" size=\"5\"><strong>!Important!</strong> </br>This notebook is outdated and is not supported by the AutoML Team. Please use the supported version ([link](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-energy-demand/automl-forecasting-task-energy-demand-advanced-mlflow.ipynb)).</font>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
        "_**Forecasting using the Energy Demand Dataset**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#introduction)\n",
        "1. [Setup](#setup)\n",
        "1. [Data and Forecasting Configurations](#data)\n",
        "1. [Train](#train)\n",
        "1. [Generate and Evaluate the Forecast](#forecast)\n",
        "\n",
        "Advanced Forecasting\n",
        "1. [Advanced Training](#advanced_training)\n",
        "1. [Advanced Results](#advanced_results)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Introduction<a id=\"introduction\"></a>\n",
        "\n",
        "In this example we use the associated New York City energy demand dataset to showcase how you can use AutoML for a simple forecasting problem and explore the results. The goal is predict the energy demand for the next 48 hours based on historic time-series data.\n",
        "\n",
        "If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) first, if you haven't already, to establish your connection to the AzureML Workspace.\n",
        "\n",
        "In this notebook you will learn how to:\n",
        "1. Creating an Experiment using an existing Workspace\n",
        "1. Configure AutoML using 'AutoMLConfig'\n",
        "1. Train the model using AmlCompute\n",
        "1. Explore the engineered features and results\n",
        "1. Generate the forecast and compute the out-of-sample accuracy metrics\n",
        "1. Configuration and remote run of AutoML for a time-series model with lag and rolling window features\n",
        "1. Run and explore the forecast with lagging features"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Setup<a id=\"setup\"></a>"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import json\n",
        "import logging\n",
        "\n",
        "from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n",
        "from matplotlib import pyplot as plt\n",
        "import pandas as pd\n",
        "import numpy as np\n",
        "import warnings\n",
        "import os\n",
        "\n",
        "# Squash warning messages for cleaner output in the notebook\n",
        "warnings.showwarning = lambda *args, **kwargs: None\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core import Experiment, Workspace, Dataset\n",
        "from azureml.train.automl import AutoMLConfig\n",
        "from datetime import datetime"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This notebook is compatible with Azure ML SDK version 1.35.0 or later."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# choose a name for the run history container in the workspace\n",
        "experiment_name = \"automl-forecasting-energydemand\"\n",
        "\n",
        "# # project folder\n",
        "# project_folder = './sample_projects/automl-forecasting-energy-demand'\n",
        "\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output[\"Subscription ID\"] = ws.subscription_id\n",
        "output[\"Workspace\"] = ws.name\n",
        "output[\"Resource Group\"] = ws.resource_group\n",
        "output[\"Location\"] = ws.location\n",
        "output[\"Run History Name\"] = experiment_name\n",
        "output[\"SDK Version\"] = azureml.core.VERSION\n",
        "pd.set_option(\"display.max_colwidth\", None)\n",
        "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create or Attach existing AmlCompute\n",
        "A compute target is required to execute a remote Automated ML run. \n",
        "\n",
        "[Azure Machine Learning Compute](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) is a managed-compute infrastructure that allows the user to easily create a single or multi-node compute. In this tutorial, you create AmlCompute as your training compute resource.\n",
        "\n",
        "#### Creation of AmlCompute takes approximately 5 minutes. \n",
        "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
        "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "# Choose a name for your cluster.\n",
        "amlcompute_cluster_name = \"energy-cluster\"\n",
        "\n",
        "# Verify that cluster does not exist already\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
        "    print(\"Found existing cluster, use it.\")\n",
        "except ComputeTargetException:\n",
        "    compute_config = AmlCompute.provisioning_configuration(\n",
        "        vm_size=\"STANDARD_DS12_V2\", max_nodes=6\n",
        "    )\n",
        "    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
        "\n",
        "compute_target.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Data<a id=\"data\"></a>\n",
        "\n",
        "We will use energy consumption [data from New York City](http://mis.nyiso.com/public/P-58Blist.htm) for model training. The data is stored in a tabular format and includes energy demand and basic weather data at an hourly frequency. \n",
        "\n",
        "With Azure Machine Learning datasets you can keep a single copy of data in your storage, easily access data during model training, share data and collaborate with other users. Below, we will upload the datatset and create a [tabular dataset](https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/service/how-to-create-register-datasets#dataset-types) to be used training and prediction."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's set up what we know about the dataset.\n",
        "\n",
        "<b>Target column</b> is what we want to forecast.<br></br>\n",
        "<b>Time column</b> is the time axis along which to predict.\n",
        "\n",
        "The other columns, \"temp\" and \"precip\", are implicitly designated as features."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "target_column_name = \"demand\"\n",
        "time_column_name = \"timeStamp\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "dataset = Dataset.Tabular.from_delimited_files(\n",
        "    path=\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/nyc_energy.csv\"\n",
        ").with_timestamp_columns(fine_grain_timestamp=time_column_name)\n",
        "dataset.take(5).to_pandas_dataframe().reset_index(drop=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The NYC Energy dataset is missing energy demand values for all datetimes later than August 10th, 2017 5AM. Below, we trim the rows containing these missing values from the end of the dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Cut off the end of the dataset due to large number of nan values\n",
        "dataset = dataset.time_before(datetime(2017, 10, 10, 5))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Split the data into train and test sets"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The first split we make is into train and test sets. Note that we are splitting on time. Data before and including August 8th, 2017 5AM will be used for training, and data after will be used for testing."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# split into train based on time\n",
        "train = (\n",
        "    dataset.time_before(datetime(2017, 8, 8, 5), include_boundary=True)\n",
        "    .to_pandas_dataframe()\n",
        "    .reset_index(drop=True)\n",
        ")\n",
        "train.sort_values(time_column_name).tail(5)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# split into test based on time\n",
        "test = (\n",
        "    dataset.time_between(datetime(2017, 8, 8, 6), datetime(2017, 8, 10, 5))\n",
        "    .to_pandas_dataframe()\n",
        "    .reset_index(drop=True)\n",
        ")\n",
        "test.head(5)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "jupyter": {
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "# register the splitted train and test data in workspace storage\n",
        "from azureml.data.dataset_factory import TabularDatasetFactory\n",
        "\n",
        "datastore = ws.get_default_datastore()\n",
        "train_dataset = TabularDatasetFactory.register_pandas_dataframe(\n",
        "    train, target=(datastore, \"dataset/\"), name=\"nyc_energy_train\"\n",
        ")\n",
        "test_dataset = TabularDatasetFactory.register_pandas_dataframe(\n",
        "    test, target=(datastore, \"dataset/\"), name=\"nyc_energy_test\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Setting the maximum forecast horizon\n",
        "\n",
        "The forecast horizon is the number of periods into the future that the model should predict. It is generally recommend that users set forecast horizons to less than 100 time periods (i.e. less than 100 hours in the NYC energy example). Furthermore, **AutoML's memory use and computation time increase in proportion to the length of the horizon**, so consider carefully how this value is set. If a long horizon forecast really is necessary, consider aggregating the series to a coarser time scale. \n",
        "\n",
        "Learn more about forecast horizons in our [Auto-train a time-series forecast model](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-auto-train-forecast#configure-and-run-experiment) guide.\n",
        "\n",
        "In this example, we set the horizon to 48 hours."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "forecast_horizon = 48"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Forecasting Parameters\n",
        "To define forecasting parameters for your experiment training, you can leverage the ForecastingParameters class. The table below details the forecasting parameter we will be passing into our experiment.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**time_column_name**|The name of your time column.|\n",
        "|**forecast_horizon**|The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly).|\n",
        "|**freq**|Forecast frequency. This optional parameter represents the period with which the forecast is desired, for example, daily, weekly, yearly, etc. Use this parameter for the correction of time series containing irregular data points or for padding of short time series. The frequency needs to be a pandas offset alias. Please refer to [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects) for more information.\n",
        "|**cv_step_size**|Number of periods between two consecutive cross-validation folds. The default value is \"auto\", in which case AutoMl determines the cross-validation step size automatically, if a validation set is not provided. Or users could specify an integer value."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Train<a id=\"train\"></a>\n",
        "\n",
        "Instantiate an AutoMLConfig object. This config defines the settings and data used to run the experiment. We can provide extra configurations within 'automl_settings', for this forecasting task we add the forecasting parameters to hold all the additional forecasting parameters.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**task**|forecasting|\n",
        "|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
        "|**blocked_models**|Models in blocked_models won't be used by AutoML. All supported models can be found at [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.constants.supportedmodels.forecasting?view=azure-ml-py).|\n",
        "|**experiment_timeout_hours**|Maximum amount of time in hours that the experiment take before it terminates.|\n",
        "|**training_data**|The training data to be used within the experiment.|\n",
        "|**label_column_name**|The name of the label column.|\n",
        "|**compute_target**|The remote compute for training.|\n",
        "|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection. The default value is \"auto\", in which case AutoMl determines the number of cross-validations automatically, if a validation set is not provided. Or users could specify an integer value.\n",
        "|**enable_early_stopping**|Flag to enble early termination if the score is not improving in the short term.|\n",
        "|**forecasting_parameters**|A class holds all the forecasting related parameters.|\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This notebook uses the blocked_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blocked_models list but you may need to increase the experiment_timeout_hours parameter value to get results."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
        "\n",
        "forecasting_parameters = ForecastingParameters(\n",
        "    time_column_name=time_column_name,\n",
        "    forecast_horizon=forecast_horizon,\n",
        "    freq=\"H\",  # Set the forecast frequency to be hourly\n",
        "    cv_step_size=\"auto\",\n",
        ")\n",
        "\n",
        "automl_config = AutoMLConfig(\n",
        "    task=\"forecasting\",\n",
        "    primary_metric=\"normalized_root_mean_squared_error\",\n",
        "    blocked_models=[\"ExtremeRandomTrees\", \"AutoArima\", \"Prophet\"],\n",
        "    experiment_timeout_hours=0.3,\n",
        "    training_data=train_dataset,\n",
        "    label_column_name=target_column_name,\n",
        "    compute_target=compute_target,\n",
        "    enable_early_stopping=True,\n",
        "    n_cross_validations=\"auto\",  # Feel free to set to a small integer (>=2) if runtime is an issue.\n",
        "    verbosity=logging.INFO,\n",
        "    forecasting_parameters=forecasting_parameters,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Call the `submit` method on the experiment object and pass the run configuration. Depending on the data and the number of iterations this can run for a while.\n",
        "One may specify `show_output = True` to print currently running iterations to the console."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run = experiment.submit(automl_config, show_output=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run.wait_for_completion()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Retrieve the Best Run details\n",
        "Below we retrieve the best Run object from among all the runs in the experiment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "best_run = remote_run.get_best_child()\n",
        "best_run"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Featurization\n",
        "We can look at the engineered feature names generated in time-series featurization via. the JSON file named 'engineered_feature_names.json' under the run outputs."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Download the JSON file locally\n",
        "best_run.download_file(\n",
        "    \"outputs/engineered_feature_names.json\", \"engineered_feature_names.json\"\n",
        ")\n",
        "with open(\"engineered_feature_names.json\", \"r\") as f:\n",
        "    records = json.load(f)\n",
        "\n",
        "records"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### View featurization summary\n",
        "You can also see what featurization steps were performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:\n",
        "\n",
        "+ Raw feature name\n",
        "+ Number of engineered features formed out of this raw feature\n",
        "+ Type detected\n",
        "+ If feature was dropped\n",
        "+ List of feature transformations for the raw feature"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Download the featurization summary JSON file locally\n",
        "best_run.download_file(\n",
        "    \"outputs/featurization_summary.json\", \"featurization_summary.json\"\n",
        ")\n",
        "\n",
        "# Render the JSON as a pandas DataFrame\n",
        "with open(\"featurization_summary.json\", \"r\") as f:\n",
        "    records = json.load(f)\n",
        "fs = pd.DataFrame.from_records(records)\n",
        "\n",
        "# View a summary of the featurization\n",
        "fs[\n",
        "    [\n",
        "        \"RawFeatureName\",\n",
        "        \"TypeDetected\",\n",
        "        \"Dropped\",\n",
        "        \"EngineeredFeatureCount\",\n",
        "        \"Transformations\",\n",
        "    ]\n",
        "]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Forecasting<a id=\"forecast\"></a>\n",
        "\n",
        "Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. We will do batch scoring on the test dataset which should have the same schema as training dataset.\n",
        "\n",
        "The inference will run on a remote compute. In this example, it will re-use the training compute."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "test_experiment = Experiment(ws, experiment_name + \"_inference\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieving forecasts from the model\n",
        "We have created a function called `run_forecast` that submits the test data to the best model determined during the training run and retrieves forecasts. This function uses a helper script `forecasting_script` which is uploaded and expecuted on the remote compute."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from run_forecast import run_remote_inference\n",
        "\n",
        "remote_run_infer = run_remote_inference(\n",
        "    test_experiment=test_experiment,\n",
        "    compute_target=compute_target,\n",
        "    train_run=best_run,\n",
        "    test_dataset=test_dataset,\n",
        "    target_column_name=target_column_name,\n",
        ")\n",
        "remote_run_infer.wait_for_completion(show_output=False)\n",
        "\n",
        "# download the inference output file to the local machine\n",
        "remote_run_infer.download_file(\"outputs/predictions.csv\", \"predictions.csv\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Evaluate\n",
        "To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities for some select metrics, included the mean absolute percentage error (MAPE). For more metrics that can be used for evaluation after training, please see [supported metrics](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#regressionforecasting-metrics), and [how to calculate residuals](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#residuals)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# load forecast data frame\n",
        "fcst_df = pd.read_csv(\"predictions.csv\", parse_dates=[time_column_name])\n",
        "fcst_df.head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.automl.core.shared import constants\n",
        "from azureml.automl.runtime.shared.score import scoring\n",
        "from matplotlib import pyplot as plt\n",
        "\n",
        "# use automl metrics module\n",
        "scores = scoring.score_regression(\n",
        "    y_test=fcst_df[target_column_name],\n",
        "    y_pred=fcst_df[\"predicted\"],\n",
        "    metrics=list(constants.Metric.SCALAR_REGRESSION_SET),\n",
        ")\n",
        "\n",
        "print(\"[Test data scores]\\n\")\n",
        "for key, value in scores.items():\n",
        "    print(\"{}:   {:.3f}\".format(key, value))\n",
        "\n",
        "# Plot outputs\n",
        "%matplotlib inline\n",
        "test_pred = plt.scatter(fcst_df[target_column_name], fcst_df[\"predicted\"], color=\"b\")\n",
        "test_test = plt.scatter(\n",
        "    fcst_df[target_column_name], fcst_df[target_column_name], color=\"g\"\n",
        ")\n",
        "plt.legend(\n",
        "    (test_pred, test_test), (\"prediction\", \"truth\"), loc=\"upper left\", fontsize=8\n",
        ")\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Advanced Training <a id=\"advanced_training\"></a>\n",
        "We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, time series identifier columns and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Using lags and rolling window features\n",
        "Now we will configure the target lags, that is the previous values of the target variables, meaning the prediction is no longer horizon-less. We therefore must still specify the `forecast_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we will construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we will generate the `max`, `min` and `sum` features.\n",
        "\n",
        "This notebook uses the blocked_models parameter to exclude some models that take a longer time to train on this dataset.  You can choose to remove models from the blocked_models list but you may need to increase the iteration_timeout_minutes parameter value to get results."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "advanced_forecasting_parameters = ForecastingParameters(\n",
        "    time_column_name=time_column_name,\n",
        "    forecast_horizon=forecast_horizon,\n",
        "    target_lags=12,\n",
        "    target_rolling_window_size=4,\n",
        "    cv_step_size=\"auto\",\n",
        ")\n",
        "\n",
        "automl_config = AutoMLConfig(\n",
        "    task=\"forecasting\",\n",
        "    primary_metric=\"normalized_root_mean_squared_error\",\n",
        "    blocked_models=[\n",
        "        \"ElasticNet\",\n",
        "        \"ExtremeRandomTrees\",\n",
        "        \"GradientBoosting\",\n",
        "        \"XGBoostRegressor\",\n",
        "        \"ExtremeRandomTrees\",\n",
        "        \"AutoArima\",\n",
        "        \"Prophet\",\n",
        "    ],  # These models are blocked for tutorial purposes, remove this for real use cases.\n",
        "    experiment_timeout_hours=0.3,\n",
        "    training_data=train_dataset,\n",
        "    label_column_name=target_column_name,\n",
        "    compute_target=compute_target,\n",
        "    enable_early_stopping=True,\n",
        "    n_cross_validations=\"auto\",  # Feel free to set to a small integer (>=2) if runtime is an issue.\n",
        "    verbosity=logging.INFO,\n",
        "    forecasting_parameters=advanced_forecasting_parameters,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We now start a new remote run, this time with lag and rolling window featurization. AutoML applies featurizations in the setup stage, prior to iterating over ML models. The full training set is featurized first, followed by featurization of each of the CV splits. Lag and rolling window features introduce additional complexity, so the run will take longer than in the previous example that lacked these featurizations."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "advanced_remote_run = experiment.submit(automl_config, show_output=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "advanced_remote_run.wait_for_completion()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieve the Best Run details"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "best_run_lags = remote_run.get_best_child()\n",
        "best_run_lags"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Advanced Results<a id=\"advanced_results\"></a>\n",
        "We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, time series identifier columns and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "test_experiment_advanced = Experiment(ws, experiment_name + \"_inference_advanced\")\n",
        "advanced_remote_run_infer = run_remote_inference(\n",
        "    test_experiment=test_experiment_advanced,\n",
        "    compute_target=compute_target,\n",
        "    train_run=best_run_lags,\n",
        "    test_dataset=test_dataset,\n",
        "    target_column_name=target_column_name,\n",
        "    inference_folder=\"./forecast_advanced\",\n",
        ")\n",
        "advanced_remote_run_infer.wait_for_completion(show_output=False)\n",
        "\n",
        "# download the inference output file to the local machine\n",
        "advanced_remote_run_infer.download_file(\n",
        "    \"outputs/predictions.csv\", \"predictions_advanced.csv\"\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "fcst_adv_df = pd.read_csv(\"predictions_advanced.csv\", parse_dates=[time_column_name])\n",
        "fcst_adv_df.head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.automl.core.shared import constants\n",
        "from azureml.automl.runtime.shared.score import scoring\n",
        "from matplotlib import pyplot as plt\n",
        "\n",
        "# use automl metrics module\n",
        "scores = scoring.score_regression(\n",
        "    y_test=fcst_adv_df[target_column_name],\n",
        "    y_pred=fcst_adv_df[\"predicted\"],\n",
        "    metrics=list(constants.Metric.SCALAR_REGRESSION_SET),\n",
        ")\n",
        "\n",
        "print(\"[Test data scores]\\n\")\n",
        "for key, value in scores.items():\n",
        "    print(\"{}:   {:.3f}\".format(key, value))\n",
        "\n",
        "# Plot outputs\n",
        "%matplotlib inline\n",
        "test_pred = plt.scatter(\n",
        "    fcst_adv_df[target_column_name], fcst_adv_df[\"predicted\"], color=\"b\"\n",
        ")\n",
        "test_test = plt.scatter(\n",
        "    fcst_adv_df[target_column_name], fcst_adv_df[target_column_name], color=\"g\"\n",
        ")\n",
        "plt.legend(\n",
        "    (test_pred, test_test), (\"prediction\", \"truth\"), loc=\"upper left\", fontsize=8\n",
        ")\n",
        "plt.show()"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "jialiu"
      }
    ],
    "categories": [
      "how-to-use-azureml",
      "automated-machine-learning"
    ],
    "kernel_info": {
      "name": "python3"
    },
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.10"
    },
    "microsoft": {
      "ms_spell_check": {
        "ms_spell_check_language": "en"
      }
    },
    "nteract": {
      "version": "nteract-front-end@1.0.0"
    },
    "vscode": {
      "interpreter": {
        "hash": "6bd77c88278e012ef31757c15997a7bea8c943977c43d6909403c00ae11d43ca"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}